A CHARACTERIZATION OF BEST UNBIASED ESTIMATORS


IOSIF PINELIS 


Abstract. A simple characterization of uniformly minimum variance unbiased
estimators (UMVUEs) is provided (in the case when the sample space is
finite) in terms of a linear independence condition on the likelihood functions
corresponding to the possible samples. The crucial observation in the proof is
that, if a UMVUE exists, then, after an appropriate cleaning of the parameter
space, the nonzero likelihood functions are eigenvectors of an “artificial”
matrix of Lagrange multipliers, and the values of the UMVUE are eigenvalues
of that matrix. The characterization is then extended to best unbiased
estimators with respect to arbitrary convex loss functions.


Let $(X, \Sigma)$ be a measurable space, so that $X$ is a set and $\Sigma$ is a sigma-algebra
of subsets of $X$. The set $X$ is to be interpreted as the set of all possible statistical
samples and will be assumed nonempty. Any mapping of $X$ to $\mathbb{R}$ that is measurable
with respect to the sigma-algebra $\Sigma$ over $X$ and the Borel sigma-algebra over $\mathbb{R}$ is
called a (real-valued) statistic or, equivalently, a (real-valued) estimator.

Let $\mathcal{P} := (\mathsf{P}_\theta)_{\theta\in\Theta}$ be a family of probability measures on $\Sigma$, where $\Theta$ is a
nonempty set, called the parameter space. The triple $(X, \Sigma, \mathcal{P})$ is called a statistical
model. Let us say that a set $N \in \Sigma$ is a null set (for the model) if $\mathsf{P}_\theta(N) = 0$ for
all $\theta \in \Theta$.

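For each $\theta \in \Theta$, let $\mathsf{E}_\theta$ denote the expectation with respect to the probability
measure $\mathsf{P}_\theta$. For $j = 1, 2$, let $\mathcal{L}^j = \mathcal{L}^j(X, \Sigma, \mathcal{P})$ stand for the set of all statistics $T$
such that $\mathsf{E}_\theta |T|^j < \infty$ for all $\theta \in \Theta$.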

Let $b$ be any function from $\Theta$ to $\mathbb{R}$. A statistic $T \in \mathcal{L}^1$ is called unbiased for the
function $b$ if

(1) $\mathsf{E}_\theta T = b(\theta)$ for all $\theta \in \Theta$;

on the other hand, for a given statistic $T$, the function $b$ satisfying (1) may be called
the expectation function of $T$. Let $\mathcal{E}_b$ denote the set of all unbiased estimators of
the function $b$. In particular, $\mathcal{E}_0$ will denote the set of all unbiased estimators of
the zero function.

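A statistic $T \in \mathcal{L}^2$ is called a uniformly minimum variance unbiased estimator
(UMVUE) of the function $b$ if (i) $T \in \mathcal{E}_b$ and (ii) for any $\tilde T \in \mathcal{E}_b$ and all $\theta \in \Theta$ one
has $\operatorname{Var}_\theta T \le \operatorname{Var}_\theta \tilde T$ or, equivalently, $\mathsf{E}_\theta T^2 \le \mathsf{E}_\theta \tilde T^2$. If $T$ is a UMVUE of some
function $b$, let us simply say that $T$ is a UMVUE.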
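
The equivalence of the two inequalities in condition (ii) can be spelled out in one line. The following is only a sketch of that routine step; it assumes that both statistics are unbiased for the same function $b$ and have finite second moments, and the symbol $\tilde T$ for the competing estimator is notation chosen here.

```latex
\[
  \operatorname{Var}_\theta \tilde T - \operatorname{Var}_\theta T
  = \bigl(\mathsf{E}_\theta \tilde T^2 - b(\theta)^2\bigr)
    - \bigl(\mathsf{E}_\theta T^2 - b(\theta)^2\bigr)
  = \mathsf{E}_\theta \tilde T^2 - \mathsf{E}_\theta T^2
  \quad\text{for all } \theta \in \Theta .
\]
```

So, within the class of unbiased estimators of $b$, minimizing the variance is the same as minimizing the second moment.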

Let us say that a statistic $T$ is sufficient if for any statistic $S \in \mathcal{L}^1$ there exists
a statistic $S_T$ that is, for each $\theta \in \Theta$, a version of the conditional expectation
$\mathsf{E}_\theta(S|T)$; the key here is that the statistic $S_T$ is the same for all $\theta \in \Theta$. This
definition is slightly more convenient than, and is easily seen to be equivalent to,
the usual definition of a sufficient statistic; see e.g. [6, top of page 311].

A statistic $T$ is called complete if, for any Borel-measurable function $u$ from $\mathbb{R}$
to $\mathbb{R}$ such that $u \circ T \in \mathcal{E}_0$, one has $u \circ T = 0$ except on a null set.

The Lehmann-Scheffé theorem [6, Theorem 5.1] is as follows.

Theorem (Lehmann-Scheffé). Let $T$ be a complete sufficient statistic. Let a
Borel-measurable function $u$ from $\mathbb{R}$ to $\mathbb{R}$ be such that $u \circ T \in \mathcal{L}^2$. Then $u \circ T$ is a
UMVUE.


Throughout the rest of this paper, assume that the set $X$ of all samples is finite
and $\Sigma$ is the set of all subsets of $X$, unless specified otherwise. Thus, the set of all
statistics will be the same as the set $\mathbb{R}^X$ of all functions from $X$ to $\mathbb{R}$.

For each sample $x \in X$, let $\ell_x$ stand for the corresponding likelihood function,
mapping $\Theta$ to $\mathbb{R}$ and defined by the formula

(2) $\ell_x(\theta) := \mathsf{P}_\theta(\{x\})$ for all $\theta \in \Theta$.

For each $t \in T(X) := \{T(x) : x \in X\}$, take any (linear) basis $B_t$ of the set
$\{\ell_x : T(x) = t\}$ of likelihood functions; in particular, $B_t$ will be necessarily empty
if the likelihood functions $\ell_x$ are zero for all $x \in X$ such that $T(x) = t$. (As usual,
it is assumed here that the sum of an empty family is zero.)

Theorem 1. A statistic $T$ is a UMVUE iff the union $\bigcup_{t \in T(X)} B_t$ of the bases is
linearly independent.

Theorem 1 will be proved at the end of this note.

Another characterization of UMVUEs was provided by Theorem 5 of Bahadur [1],
which implies that there is a sigma-algebra $\Sigma_0$ over $X$ such that a statistic $T$ is a
UMVUE iff $T$ is $\Sigma_0$-measurable. The sigma-algebra $\Sigma_0$ can be described (see e.g.
[5]) as the set of all subsets of $X$ whose indicator is a UMVUE. It appears that the
necessary and sufficient linear-independence condition given in Theorem 1 above
is more explicit and easier to check than the $\Sigma_0$-measurability condition. On the
other hand, Bahadur’s characterization of UMVUEs holds not only for finite sets
$X$ of all samples.

Consider the matrix $P := [\mathsf{P}_\theta(\{x\}) : \theta \in \Theta,\ x \in X]$, so that the rows and columns
of $P$ represent, respectively, the probability mass functions (say $p_\theta$) of the
probability measures $\mathsf{P}_\theta$ on $X$ (for $\theta \in \Theta$) and the likelihood functions $\ell_x$ (for $x \in X$).


Example 2. Suppose that $X = \{1,2,3,4\}$ and $\Theta = \{1,2\}$.

If

$$P = P_1 := \begin{bmatrix} 1/3 & 1/3 & 1/3 & 0 \\ 1/6 & 1/3 & 1/2 & 0 \end{bmatrix},$$

then any two of the first three columns of $P$ are linearly independent, but of course
no three columns of $P$ are so. It follows immediately by Theorem 1 that a statistic
$T$ is a UMVUE here iff $T(1) = T(2) = T(3)$. It then follows that here the sigma-algebra
$\Sigma_0$ is generated by the set $\{1,2,3\}$, so that

(3) $\Sigma_0 = \{\emptyset, \{1,2,3\}, \{4\}, \{1,2,3,4\}\}$.

Alternatively, one can find the sigma-algebra $\Sigma_0$ by using its mentioned description
as the set of all subsets of $X$ whose indicator is a UMVUE. It is well-known
and easy to see that a statistic $T$ is a UMVUE iff for any statistic $H$ one has the
implication

(4) $H \in \mathcal{E}_0 \implies TH \in \mathcal{E}_0$.






Next, the condition $H \in \mathcal{E}_0$ means that the function $H$ from $X = \{1,2,3,4\}$ to $\mathbb{R}$,
identified with the row $[H(1), \dots, H(4)]$, is in the orthogonal complement, say $O$,
of the row space of the matrix $P$ to the set $\mathbb{R}^X$ of all statistics with respect to the
usual inner product, defined by the formula $T \cdot S := T(1)S(1) + \dots + T(4)S(4)$.
One finds that, for $P = P_1$, the orthogonal complement $O$ is the linear span of
two rows, say $H_1 := [1, -2, 1, 0]$ and $H_2 := [1, -2, 1, 1]$. Thus, the indicator $I_A$ of
a subset $A$ of $X = \{1,2,3,4\}$ is a UMVUE iff $(H_j I_A) \cdot p_\theta = 0$ for $j \in \{1,2\}$ and
$\theta \in \Theta = \{1,2\}$. One can then check that the set of all such subsets $A$ is the same
as the sigma-algebra $\Sigma_0$ in (3).
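
The check described in the preceding paragraph can be carried out mechanically. The sketch below is not part of the original argument; it assumes the entries of $P_1$ as reconstructed from Example 2 and uses exact rational arithmetic. The names `P1`, `H`, `dot`, `indicator_is_umvue`, and `sigma0` are hypothetical helpers introduced here. Since implication (4) is linear in $H$, it is enough to test it on the spanning rows $H_1$ and $H_2$.

```python
from fractions import Fraction as F
from itertools import combinations

# Rows of P_1 are the probability mass functions p_theta, theta = 1, 2.
P1 = [[F(1, 3), F(1, 3), F(1, 3), F(0)],
      [F(1, 6), F(1, 3), F(1, 2), F(0)]]
# H_1 and H_2, spanning the orthogonal complement O of the row space of P_1.
H = [[1, -2, 1, 0], [1, -2, 1, 1]]


def dot(u, v):
    """Usual inner product of two rows of equal length."""
    return sum(a * b for a, b in zip(u, v))


# Sanity check: H_1 and H_2 are orthogonal to both rows of P_1.
assert all(dot(h, p) == 0 for h in H for p in P1)


def indicator_is_umvue(A):
    """Test (H_j I_A) . p_theta = 0 for j = 1, 2 and theta = 1, 2."""
    ind = [1 if x in A else 0 for x in (1, 2, 3, 4)]
    return all(dot([h * i for h, i in zip(Hj, ind)], p) == 0
               for Hj in H for p in P1)


# Collect all subsets A of {1,2,3,4} whose indicator passes the test.
sigma0 = [set(A) for k in range(5) for A in combinations((1, 2, 3, 4), k)
          if indicator_is_umvue(set(A))]
print(sigma0)
```

Under these assumptions, the subsets collected in `sigma0` are exactly the four sets listed in (3).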

If

$$P = P_2 := \begin{bmatrix} 1/2 & 1/4 & 1/4 & 0 \\ 2/3 & 1/3 & 0 & 0 \end{bmatrix},$$

then the first two columns of $P$ are linearly dependent, whereas their linear span
is linearly independent of the third column. It follows immediately by Theorem 1
that a statistic $T$ is a UMVUE here iff $T(1) = T(2)$. It then follows that here the
sigma-algebra $\Sigma_0$ is generated by the sets $\{1,2\}$, $\{3\}$, $\{4\}$, so that
$\Sigma_0 = \{\emptyset, \{3\}, \{4\}, \{1,2\}, \{3,4\}, \{1,2,3\}, \{1,2,4\}, \{1,2,3,4\}\}$.
Similarly to the case $P = P_1$, one can check that the latter sigma-algebra $\Sigma_0$ coincides
with the set of all subsets of $X$ whose indicator is a UMVUE.
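
The linear-independence condition of Theorem 1 is easy to test by computer when $X$ and $\Theta$ are finite. The following is a minimal sketch, not the author's procedure. It reads the union of the bases as a union of families (so that a vector shared by two bases would count twice), in which case the union is linearly independent iff the ranks of the groups of likelihood functions add up to the rank of all of them. The function names `rank` and `is_umvue` and the test statistics are assumptions made for illustration.

```python
from fractions import Fraction as F


def rank(vectors):
    """Rank of a list of equal-length vectors, by exact Gaussian elimination."""
    rows = [list(v) for v in vectors]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def is_umvue(P, T):
    """Theorem 1 criterion; P has one row per theta, T lists the values T(x)."""
    cols = list(zip(*P))  # likelihood functions, as vectors indexed by theta
    groups = {}
    for x, t in enumerate(T):
        groups.setdefault(t, []).append(cols[x])
    # Sum of dim span{ell_x : T(x) = t} over t versus dim of the total span.
    return sum(rank(g) for g in groups.values()) == rank(cols)


P1 = [[F(1, 3), F(1, 3), F(1, 3), F(0)],
      [F(1, 6), F(1, 3), F(1, 2), F(0)]]
P2 = [[F(1, 2), F(1, 4), F(1, 4), F(0)],
      [F(2, 3), F(1, 3), F(0), F(0)]]

print(is_umvue(P1, [5, 5, 5, 7]))  # T(1) = T(2) = T(3)
print(is_umvue(P1, [5, 5, 6, 7]))  # T(3) differs from T(1) = T(2)
print(is_umvue(P2, [7, 7, 3, 4]))  # T(1) = T(2)
print(is_umvue(P2, [7, 3, 3, 4]))  # T(1) differs from T(2)
```

With the matrices of Example 2, the first and third calls should report a UMVUE and the other two should not, in agreement with the conclusions drawn from Theorem 1 above.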

At least for these two examples, with $P = P_1$ and $P = P_2$, it appears that indeed
the necessary and sufficient linear-independence condition given in Theorem 1 is
more explicit and easier to check than the $\Sigma_0$-measurability condition. However,
it also appears that there is a duality between these two necessary-and-sufficient
conditions, one in terms of the linear independence of some of the columns of the
matrix $P$ and the other one expressible in terms of the rows of $P$.

Corollary 3. If a statistic $T$ is a UMVUE, then $u \circ T$ is so for any function
$u : \mathbb{R} \to \mathbb{R}$.


Proof of Corollary 3. This follows immediately from Theorem 1 because, for any
$s \in (u \circ T)(X)$, there is a basis of the set $\{\ell_x : (u \circ T)(x) = s\}$ contained in the
union $\bigcup \{B_t : t \in T(X),\ u(t) = s\}$. □


Corollary 4. Suppose that a statistic $T$ is a UMVUE. Then $T$ is complete.

Proof of Corollary 4. Take any function $u : \mathbb{R} \to \mathbb{R}$ such that $u \circ T \in \mathcal{E}_0$. Then, by
Corollary 3, $u \circ T$ is a UMVUE of the zero function. On the other hand, the zero
statistic is clearly a UMVUE of the zero function. So, $\mathsf{E}_\theta (u \circ T)^2 = \operatorname{Var}_\theta(u \circ T) =
\operatorname{Var}_\theta 0 = 0$ for all $\theta$, whence $u \circ T = 0$ except on a null set. Thus, $T$ is complete. □


Another relation of the necessary-and-sufficient linear independence condition in
Theorem 1 with the completeness is presented in the following corollary.

Corollary 5. Suppose that a statistic $T$ is such that for each $t \in T(X)$ the basis
$B_t$ is a singleton set. Then $T$ is a UMVUE iff $T$ is complete.

This follows immediately from Theorem 1.

In the case when a complete sufficient statistic exists, the UMVUEs can be easily
characterized:

Proposition 6. Suppose that a statistic $S$ is sufficient and complete. Then a
statistic $T$ is a UMVUE iff for some function $u : \mathbb{R} \to \mathbb{R}$ one has $T = u \circ S$ except
on a null set.

Proof of Proposition 6. Given the Lehmann-Scheffé and Rao-Blackwell theorems,
this proof is easy and presented here for readers’ convenience. Indeed, the “if” part
of Proposition 6 is the Lehmann-Scheffé theorem itself.

To prove the “only if” part, take any UMVUE $T$. Then the statistics $T_S :=
\mathsf{E}_\theta(T|S)$ and $U := (T + T_S)/2$ are also UMVUE, and $\mathsf{E}_\theta T_S = \mathsf{E}_\theta U = \mathsf{E}_\theta T$ for
all $\theta \in \Theta$. However, by the Cauchy-Schwarz inequality, $\operatorname{Var}_\theta U = (\operatorname{Var}_\theta T_S +
\operatorname{Var}_\theta T + 2 \operatorname{Cov}_\theta(T_S, T))/4 < \operatorname{Var}_\theta T_S = \operatorname{Var}_\theta T$ for some $\theta \in \Theta$ unless $T_S = T$
(except on a null set). It remains to note that for some function $u : \mathbb{R} \to \mathbb{R}$ one has
$T_S = \mathsf{E}_\theta(T|S) = u \circ S$ except on a null set. □
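
The Cauchy-Schwarz step in the proof above can be made explicit as follows. This is only a sketch of the computation: the abbreviation $\sigma_\theta^2$ is notation introduced here, and the equality of the two variances is assumed, as in the proof.

```latex
\[
  \sigma_\theta^2 := \operatorname{Var}_\theta T_S = \operatorname{Var}_\theta T,
  \qquad
  \operatorname{Var}_\theta (T_S - T)
  = 2\sigma_\theta^2 - 2\operatorname{Cov}_\theta(T_S, T) \ge 0,
\]
\[
  \operatorname{Var}_\theta U
  = \frac{2\sigma_\theta^2 + 2\operatorname{Cov}_\theta(T_S, T)}{4}
  = \sigma_\theta^2 - \frac{1}{4}\operatorname{Var}_\theta (T_S - T)
  \le \sigma_\theta^2 .
\]
```

Since $T_S - T$ has zero expectation under every $\mathsf{P}_\theta$, equality for a given $\theta$ means that $T_S = T$ almost surely with respect to $\mathsf{P}_\theta$; so the inequality is strict for some $\theta$ unless $T_S = T$ except on a null set.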

Let us now present two more examples illustrating Theorem 1.

Example 7. (Bernoulli trials) Suppose that $X = \{0,1\}^n$ for some natural $n$, $\Theta =
(0,1)$, and $T(x) := \sum_{i=1}^n x_i$ and $\mathsf{P}_\theta(\{x\}) = \theta^{T(x)} (1-\theta)^{n-T(x)}$ for $x = (x_1, \dots, x_n) \in
X$. Clearly, here for each $t \in T(X)$ the basis $B_t$ is a singleton set. Also, the statistic
$T$ here is sufficient and complete. So, it follows immediately either from Theorem 1
(cf. Corollaries 3 and 5) or from Proposition 6 that a statistic $S$ is a UMVUE here
iff for some function $u : \mathbb{R} \to \mathbb{R}$ one has $S = u \circ T$.


Example 8. (Beta-Bernoulli trials) Consider the following hierarchical model of $n$
independent trials, where the success probability in each trial is a random number
$p$ sampled from a Beta distribution ($p$ is sampled just once, before the trials begin);
this is commonly used to model over-dispersion. Fix any positive real number $c$,
which can be thought of as somewhat large. Suppose that $X = \{0,1\}^n$ for some
natural $n$, $\Theta = (-c, c)$, and $T(x) := \sum_{i=1}^n x_i$ and

$$\mathsf{P}_\theta(\{x\}) = \int_0^1 p^{T(x)} (1-p)^{n-T(x)} f_{c+\theta,\,c-\theta}(p)\, dp
= \frac{\Gamma(2c)}{\Gamma(2c+n)}\, \frac{\Gamma(c+\theta+T(x))}{\Gamma(c+\theta)}\, \frac{\Gamma(c-\theta+n-T(x))}{\Gamma(c-\theta)}$$

for $x = (x_1, \dots, x_n) \in X$, where $f_{\alpha,\beta}$ is the probability density function of the Beta
distribution with positive real parameters $\alpha$ and $\beta$, so that

$$f_{\alpha,\beta}(p) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, p^{\alpha-1} (1-p)^{\beta-1}$$
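for $p \in (0,1)$. One may note that then, with $(\alpha,\beta) = (c+\theta, c-\theta)$, (i) $\mathsf{E}_\theta \frac{T}{n} = \frac{\alpha}{\alpha+\beta} = \frac{c+\theta}{2c}$
increases from 0 to 1 as $\theta$ increases from $-c$ to $c$ and (ii) the over-dispersion

$$\operatorname{Var}_\theta \frac{T}{n} - \mathsf{E}_\theta \operatorname{Var}_\theta\Big(\frac{T}{n} \,\Big|\, p\Big)
= \frac{\alpha\beta}{(\alpha+\beta)^2 (\alpha+\beta+1)}
= \frac{c^2-\theta^2}{4c^2 (2c+1)}
\le \frac{1}{4(2c+1)}$$

of $\frac{T}{n}$ is small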

(uniformly in $\theta$ and $n$) when $c$ is large. Using a reasoning similar to that in
Example 7, one comes to the same conclusion as there, that a statistic $S$ is a UMVUE
here iff for some function $u : \mathbb{R} \to \mathbb{R}$ one has $S = u \circ T$. To obtain this conclusion,
in this case one only has to verify that the likelihood functions $\ell_x$ corresponding
to samples $x$ with pairwise distinct values $T(x)$ are linearly independent. But this
follows (by the strict total positivity property of the function $\mathbb{R}^2 \ni (x, y) \mapsto e^{xy}$ and
the Polya-Szego extension of the Cauchy-Binet formula for determinants) from the
representation $\ell_x(\theta) = \mathsf{P}_\theta(\{x\}) = K(T(x), \theta)$ for all $\theta \in \Theta$ and $x \in X$, where

$$K(t, \theta) := \frac{\Gamma(2c)}{\Gamma(c+\theta)\,\Gamma(c-\theta)} \int_0^1 \Big(\frac{p}{1-p}\Big)^{t+\theta} \mu(dp)$$

and $\mu(dp) := p^{c-1} (1-p)^{c+n-1}\, dp$; cf. e.g. the paragraph containing formula (3.6)
on page 12 in [4] or pages 16-17 in [3].
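
The closed-form expression for the sample probabilities in Example 8 can be sanity-checked with exact arithmetic, because each ratio of Gamma functions there is a rising factorial. The sketch below is an illustration only: the helper names `rising` and `pmf_sample` and the particular values of $n$, $c$, $\theta$ are assumptions chosen here, and the reference values for the mean and variance are the standard Beta-binomial moments rather than anything taken from the text.

```python
from fractions import Fraction as F
from math import comb


def rising(a, k):
    """Rising factorial a(a+1)...(a+k-1), equal to Gamma(a+k)/Gamma(a)."""
    out = F(1)
    for i in range(k):
        out *= a + i
    return out


def pmf_sample(t, n, c, theta):
    """Probability of any single sample x in {0,1}^n with T(x) = t."""
    return rising(c + theta, t) * rising(c - theta, n - t) / rising(2 * c, n)


n, c, theta = 5, F(3), F(1, 2)          # illustrative values with |theta| < c
alpha, beta = c + theta, c - theta

# There are comb(n, t) samples x with T(x) = t, all equally likely.
weights = [comb(n, t) * pmf_sample(t, n, c, theta) for t in range(n + 1)]
total = sum(weights)
mean = sum(w * F(t, n) for t, w in enumerate(weights))
var = sum(w * F(t, n) ** 2 for t, w in enumerate(weights)) - mean ** 2

# Variance of the Beta prior, which bounds the over-dispersion of T/n.
prior_var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
print(total, mean, var, prior_var)
```

For these values the probabilities sum to 1 and the mean of $T/n$ equals $(c+\theta)/(2c)$; the variance of $T/n$ exceeds the prior variance only by a term of order $1/n$.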

Corollaries 3 and 4 follow as well from the mentioned Theorem 5 of Bahadur [1],
which also states that the mentioned sigma-algebra $\Sigma_0$ is complete.

The method by which Theorem 1 and Corollaries 3 and 4 were obtained appears
to be very different from that in [1], where the main, and ingenious, idea was to apply
implication (4) repeatedly and then use interpolation/approximation properties of
polynomials. Let us present this idea here to provide the following.

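Alternative proof of Corollary 4. Since $T$ is a UMVUE, applying implication (4)
repeatedly yields $T^2 H = T(TH) \in \mathcal{E}_0$, $T^3 H = T(T^2 H) \in \mathcal{E}_0$, and so on, for any
$H \in \mathcal{E}_0$. Since $\mathcal{E}_0$ is a linear space, it follows that $(f \circ T)H \in \mathcal{E}_0$ for any polynomial
$f$ over $\mathbb{R}$ and any $H \in \mathcal{E}_0$. Moreover, since the set $X$ is finite, for any function
$u : \mathbb{R} \to \mathbb{R}$ there is a polynomial $f$ over $\mathbb{R}$ such that $u \circ T = f \circ T$. So,

(5) $(u \circ T)H \in \mathcal{E}_0$ for any function $u : \mathbb{R} \to \mathbb{R}$ and any $H \in \mathcal{E}_0$.

In particular, if $u \circ T \in \mathcal{E}_0$, then $(u \circ T)^2 \in \mathcal{E}_0$, that is, $\mathsf{E}_\theta (u \circ T)^2 = 0$ for all $\theta \in \Theta$,
whence $u \circ T = 0$ except on a null set. Thus, $T$ is complete. □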

It is not clear to me if the method of [1] can be used to obtain Theorem 1 of the
present note.

The notion of the UMVUE, which is optimal with respect to a quadratic loss
function, was extended to more general loss functions $L : \Theta \times \mathbb{R} \to \mathbb{R}$. A loss
function $L$ is called convex if the function $\mathbb{R} \ni t \mapsto L(\theta, t) \in \mathbb{R}$ is convex for each
$\theta \in \Theta$. A statistic $T$ is called a uniformly best unbiased estimator with respect to
a loss function $L$ ($L$-UBUE) if for any statistic $S$ such that $S - T \in \mathcal{E}_0$ one has
$\mathsf{E}_\theta L(\theta, T) \le \mathsf{E}_\theta L(\theta, S)$ for all $\theta \in \Theta$. Obviously, if $L(\theta, t) = (t - b(\theta))^2$ for some
function $b : \Theta \to \mathbb{R}$ and all $\theta \in \Theta$ and $t \in \mathbb{R}$, then any UMVUE of $b$ is an $L$-UBUE.
A statistic $T$ is called a universally uniformly best unbiased estimator (UUBUE) if it
is $L$-UBUE for all convex loss functions $L$.

If the consideration is reduced only to statistics $T$ with a given expectation
function $b$, then of interest may be the set, say $\mathcal{C}$, of all loss functions of the
form $L_c$, where $c$ is any differentiable strictly convex function from $\mathbb{R}$ to $\mathbb{R}$ and
$L_c(\theta, t) = c(t)$ for all $(\theta, t) \in \Theta \times \mathbb{R}$, so that $L_c(\theta, t)$ does not depend on $\theta$. The
set $\mathcal{C}$ of loss functions was considered in [8]. It is easy to see that a statistic $T$
is a UMVUE iff $T$ is an $L_{\mathrm{sq}}$-UBUE, where sq is the square function given by the
formula $\mathrm{sq}(t) = t^2$ for $t \in \mathbb{R}$; also, clearly $L_{\mathrm{sq}} \in \mathcal{C}$.

Proposition 9. Take any statistic $T$ and any loss function $L \in \mathcal{C}$. Then $T$ is a
UMVUE iff $T$ is an $L$-UBUE iff $T$ is UUBUE.

Proposition 9 is known (see e.g. [5] and [8]) and is based mainly on the argument
by Bahadur presented above. For readers’ convenience, here is a proof.

Proof of Proposition 9. Take any UMVUE $T$. By (5), for any $H \in \mathcal{E}_0$ one has
$\mathsf{E}_\theta(H|T) = 0$ except on a null set. Take any statistic $S$ such that $H := S - T \in \mathcal{E}_0$.
Then $\mathsf{E}_\theta(S|T) = T$ except on a null set. So, for any convex loss function $L$ and for all
$\theta \in \Theta$ one has $\mathsf{E}_\theta L(\theta, T) = \mathsf{E}_\theta L(\theta, \mathsf{E}_\theta(S|T)) \le \mathsf{E}_\theta L(\theta, S)$, by Jensen’s inequality.
Thus, any UMVUE $T$ is a UUBUE.

That any $L$-UBUE (for any $L = L_c \in \mathcal{C}$) is a UUBUE is proved similarly. Here
(cf. [8, Proof of Satz 2]) the most significant difference is that instead of (4) one
repeatedly uses the implication $H \in \mathcal{E}_0 \implies (c' \circ T)H \in \mathcal{E}_0$, together with the
fact that the derivative $c'$ of the differentiable strictly convex function $c$ is a strictly
increasing, and hence injective, function.

That any UUBUE is an $L$-UBUE (for any $L \in \mathcal{C}$) and, in particular, is a UMVUE
is trivial. □

Remark 10. In view of Proposition 9, one can replace the term UMVUE in
Theorem 1 and Corollaries 3 and 4 either by UUBUE or by $L$-UBUE for any given
$L \in \mathcal{C}$.

Proof of Theorem 1. Let us begin with some cleaning of the set $\Theta$. Without loss of
generality (w.l.o.g.), the parameter space $\Theta$ is finite and the family $\mathcal{P} = (\mathsf{P}_\theta)_{\theta\in\Theta}$
of probability measures is linearly independent. Indeed, otherwise one can replace
the family $\mathcal{P}$ by any linear basis $(\mathsf{P}_\theta)_{\theta\in\Theta_0}$ of $\mathcal{P}$, for some $\Theta_0 \subseteq \Theta$. Then $\Theta_0$ will
be nonempty and finite, since the set $\Theta$ is nonempty and the set $X$ is finite. It is
not hard to see that this replacement of $(\mathsf{P}_\theta)_{\theta\in\Theta}$ by $(\mathsf{P}_\theta)_{\theta\in\Theta_0}$ will not affect either
the UMVUE property or the linear independence.

Let us now proceed to the proof of the “only if” part of Theorem 1. The crucial
observation here is that, if a UMVUE exists, then, after the mentioned cleaning of
the parameter space $\Theta$, the likelihood functions $\ell_x$ corresponding to the possible
samples are eigenvectors of an “artificial” matrix of Lagrange multipliers, and the
values of the UMVUE are eigenvalues of that matrix.

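Let indeed $T$ be a UMVUE of a function $b : \Theta \to \mathbb{R}$. For $x \in X$ and $\theta \in \Theta$,
introduce the abbreviations

$t_x := T(x)$ and $p_{\theta,x} := \mathsf{P}_\theta(\{x\})$.

Then

(6) $\mathsf{E}_\theta T^j = \sum_{x \in X} t_x^j\, p_{\theta,x}$ for all $\theta \in \Theta$ and $j = 1, 2$.

Fix for a moment any $\theta \in \Theta$. Then, because $T$ is a UMVUE of $b$ and in view of
(6), the family $(t_x)_{x \in X}$ of the values of $T$ on $X$ is a minimizer of

$\frac{1}{2} \sum_{x \in X} t_x^2\, p_{\theta,x}$

over all families $(t_x)_{x \in X}$ in $\mathbb{R}$ such that

$\sum_{x \in X} t_x\, p_{\tau,x} = b(\tau)$ for all $\tau \in \Theta$.

Therefore and because the family $(p_{\tau,\cdot})_{\tau \in \Theta}$ is linearly independent, by the Euler-
Lagrange multiplier rule (see e.g. [3, page 441]), there exist Lagrange multipliers
$\lambda_{\theta,\tau} \in \mathbb{R}$ ($\tau \in \Theta$) such that

(7) $t_x\, p_{\theta,x} = \sum_{\tau \in \Theta} \lambda_{\theta,\tau}\, p_{\tau,x}$

for all $x \in X$. Now unfix $\theta \in \Theta$.

Then the system of equations (7) can be rewritten in matrix form:

(8) $\Lambda \ell_x = t_x \ell_x$ for all $x \in X$,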

where $\Lambda := [\lambda_{\theta,\tau} : \theta \in \Theta,\ \tau \in \Theta]$. Recall that, for each $x \in X$, $\ell_x$ is the corresponding
likelihood function, mapping the finite (after the cleaning) set $\Theta$ to $\mathbb{R}$, and $\ell_x$
is identified with the corresponding column. Thus, (8) means precisely that, for
each $x \in X$ with the corresponding nonzero likelihood function $\ell_x$, (i) the column
$\ell_x$ is an eigenvector of the square matrix $\Lambda$ of Lagrange multipliers and (ii) the
value $t_x = T(x)$ of the UMVUE $T$ on $x$ is the corresponding eigenvalue of $\Lambda$; this
is the mentioned key observation in the proof. Now, to complete the proof of the
“only if” part of Theorem 1, it remains to recall that any family of eigenvectors of
a matrix corresponding to pairwise distinct eigenvalues is linearly independent
(and hence so is any union of linearly independent subsets of the eigenspaces
corresponding to pairwise distinct eigenvalues).
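
To see the eigenvector relation at work, one may take $P = P_2$ from Example 2, whose two rows are linearly independent, so that no cleaning of $\Theta$ is needed. The following worked instance is an illustration added here, not part of the original proof; the values $a$, $b$, $c$ of the statistic are hypothetical, with $T(1) = T(2) = a$, $T(3) = b$, $T(4) = c$.

```latex
\[
  \ell_1 = \begin{bmatrix} 1/2 \\ 2/3 \end{bmatrix},\quad
  \ell_2 = \begin{bmatrix} 1/4 \\ 1/3 \end{bmatrix},\quad
  \ell_3 = \begin{bmatrix} 1/4 \\ 0 \end{bmatrix},\quad
  \ell_4 = \begin{bmatrix} 0 \\ 0 \end{bmatrix},
  \qquad
  \Lambda = \begin{bmatrix} b & \tfrac{3}{4}(a-b) \\ 0 & a \end{bmatrix},
\]
\[
  \Lambda \ell_1
  = \begin{bmatrix} \tfrac{b}{2} + \tfrac{a-b}{2} \\ \tfrac{2a}{3} \end{bmatrix}
  = a\,\ell_1,
  \qquad
  \Lambda \ell_2
  = \begin{bmatrix} \tfrac{b}{4} + \tfrac{a-b}{4} \\ \tfrac{a}{3} \end{bmatrix}
  = a\,\ell_2,
  \qquad
  \Lambda \ell_3
  = \begin{bmatrix} \tfrac{b}{4} \\ 0 \end{bmatrix}
  = b\,\ell_3,
  \qquad
  \Lambda \ell_4 = 0 = c\,\ell_4 .
\]
```

Here the matrix is determined by its action on the basis formed by the first and third columns. The zero likelihood function of the sample 4 imposes no constraint, which is why the value $c$ is arbitrary.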

Let us now turn to the proof of the “if” part. Accordingly, suppose that the
union $\bigcup_{t \in T(X)} B_t$ of the bases is linearly independent. For each $t \in T(X)$, let $V_t$
denote the linear span of the basis $B_t$, so that $V_t$ is a linear subspace of the linear
space $\mathbb{R}^\Theta$ of all functions from $\Theta$ into $\mathbb{R}$. Let $U := \sum_{t \in T(X)} V_t$, so that $U$ is a linear
subspace of $\mathbb{R}^\Theta$. Let $W$ be any linear subspace of $\mathbb{R}^\Theta$ that complements $U$ to $\mathbb{R}^\Theta$;
that is, $W$ is such that $U + W = \mathbb{R}^\Theta$ and $U \cap W = \{0\}$. Then each vector $v \in \mathbb{R}^\Theta$
can be uniquely represented in the form $w + \sum_{t \in T(X)} v_t$, where $w \in W$ and $v_t \in V_t$
for all $t \in T(X)$. Thus, one has a valid definition of a linear operator $M : \mathbb{R}^\Theta \to \mathbb{R}^\Theta$
by the formula

(9) $M\big(w + \sum_{t \in T(X)} v_t\big) := \sum_{t \in T(X)} t\, v_t$.

Then, in particular, $M \ell_x = t_x \ell_x$ for all $x \in X$, where $t_x := T(x)$, as before. So,
letting $\Lambda$ be the matrix of the linear operator $M$, one has (8). By the last sentence
of the Corollary on page 440 of [7] to the “Convex Multiplier Rule”, it now follows
that $T$ is a UMVUE. This completes the proof of the “if” part and hence that of
the entire theorem. □

Acknowledgment. I am pleased to thank Lutz Mattner for pointing out that 
Corollary!?] is known and drawing my attention to paper [5] and references therein. 